Multiagent Q-Learning by Context-Specific Coordination Graphs
Abstract
One of the main problems in cooperative multiagent learning is that the joint action space is exponential in the number of agents. In this paper, we investigate a sparse representation of the joint action space in which value rules specify the coordination dependencies between the different agents for a particular state. Each value rule has an associated payoff which is part of the global Q-function. We discuss a Q-learning method that updates these context-specific rules based on the optimal joint action found with the coordination graph algorithm. We apply our method to the pursuit domain and compare it with other multiagent reinforcement learning methods.
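To make the idea concrete, the sketch below encodes a handful of context-specific value rules for two agents and performs the corresponding sparse Q-update. The rule set, the state labels, and the equal division of the TD-error over applicable rules are illustrative assumptions, and the brute-force maximization stands in for the coordination graph (variable elimination) algorithm the paper actually uses.

```python
import itertools

ACTIONS = ["stay", "move"]  # per-agent action set (hypothetical)

# Each value rule: a state predicate, the partial joint action it applies to,
# and a payoff rho. The global Q-value of a joint action in a state is the
# sum of the payoffs of all applicable rules.
rules = [
    {"state": "near_prey", "actions": {0: "move", 1: "move"}, "rho": 0.0},
    {"state": "near_prey", "actions": {0: "stay"},            "rho": 0.0},
    {"state": "far",       "actions": {1: "move"},            "rho": 0.0},
]

def applicable(rule, state, joint):
    # A rule applies when its state matches and every agent it mentions
    # takes the action the rule specifies.
    return rule["state"] == state and all(
        joint[i] == a for i, a in rule["actions"].items()
    )

def q_value(state, joint):
    return sum(r["rho"] for r in rules if applicable(r, state, joint))

def best_joint_action(state):
    # Brute-force enumeration of joint actions; the coordination graph
    # algorithm exploits the rule structure to avoid this.
    return max(itertools.product(ACTIONS, repeat=2),
               key=lambda joint: q_value(state, joint))

def q_update(state, joint, reward, next_state, alpha=0.3, gamma=0.9):
    # Distribute the TD-error evenly over the rules active in (state, joint).
    # (Assumed weighting; the paper's exact update may differ.)
    target = reward + gamma * q_value(next_state, best_joint_action(next_state))
    active = [r for r in rules if applicable(r, state, joint)]
    if not active:
        return
    delta = (target - q_value(state, joint)) / len(active)
    for r in active:
        r["rho"] += alpha * delta
```

A learning loop would call best_joint_action for (greedy) action selection and q_update after each observed transition, so only the payoffs of the rules that actually fired are adjusted.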
Related papers
Multiagent Coordination in Cooperative Q-learning Systems
Many reinforcement learning architectures fail to learn optimal group behaviors in the multiagent domain. Although these coordination difficulties are often attributed to the non-Markovian environment created by the gradually-changing policies of concurrently learning agents, a careful analysis of the situation reveals an underlying problem structure which can cause suboptimal group policies ev...
Graphical models in continuous domains for multiagent reinforcement learning
In this paper we test two coordination methods – difference rewards and coordination graphs – in a continuous, multiagent rover domain using reinforcement learning, and discuss the situations in which each of these methods performs better, alone or together, and why. We also contribute a novel method for applying coordination graphs in a continuous domain by taking advantage of the wire-fitting ap...
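For reference, a difference reward credits each agent with the change in the global reward caused by its own action. A minimal sketch, assuming a generic global reward function and a fixed default action as the counterfactual (both hypothetical stand-ins for the rover domain's exact formulation):

```python
def difference_rewards(joint_action, global_reward, default_action=None):
    # Credit agent i with G(z) - G(z with agent i's action replaced by a
    # fixed default), approximating "removing" the agent from the system.
    g = global_reward(joint_action)
    credits = []
    for i in range(len(joint_action)):
        counterfactual = list(joint_action)
        counterfactual[i] = default_action
        credits.append(g - global_reward(counterfactual))
    return credits
```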
Reinforcement Learning in Multi-agent Games
This article investigates the performance of independent reinforcement learners in multiagent games. Convergence to Nash equilibria and parameter settings for desired learning behavior are discussed for Q-learning, Frequency Maximum Q value (FMQ) learning and lenient Q-learning. FMQ and lenient Q-learning are shown to outperform regular Q-learning significantly in the context of coordination ga...
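Lenient Q-learning can be sketched in a few lines for a stateless coordination game: the learner forgives a limited number of updates that would lower an action's value, so penalties caused by the other agents' exploration do not dominate early learning. The fixed per-action forgiveness budget below is a simplification of the temperature-based leniency schedules in the literature:

```python
import random
from collections import defaultdict

class LenientQLearner:
    def __init__(self, actions, alpha=0.1, leniency=10):
        self.q = defaultdict(float)
        self.forgiveness = defaultdict(lambda: leniency)  # forgiven updates per action
        self.actions = actions
        self.alpha = alpha

    def act(self, epsilon=0.1):
        # Epsilon-greedy action selection over the learned Q-values.
        if random.random() < epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[a])

    def update(self, action, reward):
        # Leniency: while forgiveness remains, ignore rewards that would
        # lower Q(action); afterwards, fall back to the regular update.
        if reward < self.q[action] and self.forgiveness[action] > 0:
            self.forgiveness[action] -= 1
            return
        self.q[action] += self.alpha * (reward - self.q[action])
```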
Labeled Initialized Adaptive Play Q-learning for Stochastic Games
Recently, initializing the Q-values of multiagent Q-learning with the optimal single-agent Q-values has shown good results in reducing the complexity of the learning process. In this paper, we continue in the same vein and give a brief description of the Initialized Adaptive Play Q-learning (IAPQ) algorithm while establishing an effective stopping criterion for this algorithm. To do t...
Utile Coordination: Learning Interdependencies Among Cooperative Agents
We describe Utile Coordination, an algorithm that allows a multiagent system to learn where and how to coordinate. The method starts with uncoordinated learners and maintains statistics on expected returns. Coordination dependencies are dynamically added if the statistics indicate a statistically significant benefit. This results in a compact state representation because only necessary coordina...
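The core of such a test can be sketched as follows, assuming per-state samples of returns collected with and without conditioning on the other agents' actions; the fixed z-threshold is an illustrative stand-in for the paper's actual significance test:

```python
import statistics

def should_coordinate(coord_returns, indep_returns, z_threshold=1.96):
    # Add a coordination dependency only when the returns obtained while
    # coordinating are significantly higher than the independent returns.
    if len(coord_returns) < 2 or len(indep_returns) < 2:
        return False  # not enough evidence yet
    diff = statistics.mean(coord_returns) - statistics.mean(indep_returns)
    se = (statistics.variance(coord_returns) / len(coord_returns)
          + statistics.variance(indep_returns) / len(indep_returns)) ** 0.5
    return se > 0 and diff / se > z_threshold
```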